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Abstract 


Spiking neural networks, based on biologically-plausible neurons with temporal infor¬ 
mation coding, are provably more powerful than widely used artihcial neural networks 
based on sigmoid neurons (ANNs). However, training them is more challenging than 
training ANNs. Several methods have been proposed in the literature, each with its 
limitations: SpikeProp, NSEBP, ReSuMe, etc. And setting numerous parameters of 
spiking networks to obtain good accuracy has been largely ad hoc. 

In this work, we used automated algorithm conhguration tools to determine opti¬ 
mal combinations of parameters for ANNs, artihcial neural networks with components 
simulating glia cells (astrocytes), and for spiking neural networks with SpikeProp 
learning algorithm. This allowed us to achieve better accuracy on standard datasets 
(Iris and Wisconsin Breast Cancer), and showed that even after optimization aug¬ 
menting an artihcial neural network with glia results in improved performance. 

Guided by the experimental results, we have developed methods for determining 
values of several parameters of spiking neural networks, in particular weight and out¬ 
put ranges. These methods have been incorporated into a SpikeProp implementation. 
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Chapter 1 


Introduction 


Recent years have brought us numerous advances in artihcial intelligence (AI) and 
machine learning. With AlphaGo beating the world’s Go champion Lee Sedol in 2016, 
self-driving cars and trucks being tested on the roads, and a proliferation of voice- 
based virtual assistants, AI is becoming a household word. Much of this renaissance 
of AI seems to be due to algorithmic advances in training artihcial neural networks, 
in particular deep neural networks. That started in earnest from the celebrated paper 
by Hinton, Osindero and Teh [HOT06], which presented a method to pre-train deep 
neural networks one layer at a time, and paved the way for widespread industrial use 
of deep neural networks. 

However powerful deep neural networks seem to be, they are still second-generation 
neural networks, with each neuron computing a logistic/sigmoidal function. Though 
better than neurons computing a linear threshold function of its inputs in hrst- 
generation neural networks, sigmoidal neurons still lack the power of third-generation, 
most biologically plausible type of neurons: the spiking neurons. There, the output 
of a function computed by each neuron depends not only the value of its inputs, but 
also on the time each input arrived. Such temporal coding allows a spiking neuron to 
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compute provably more than its sigmoidal counterpart can [Maa97]. Moreover, spik¬ 
ing neurons are much more closely related to actual biological neurons, with similar 
models describing their behaviour. 

However, the power of spiking neural networks comes at a price. Whereas it is 
well understood how to train a second-generation neural network, with the backprop- 
agation algorithm being the gold standard, the question of training a spiking neural 
network is still wide open. In unsupervised learning scenario the commonly used ap¬ 
proach is STDP; spike timing dependent plasticity spike-timing dependent plasticity, 
where the connection between two neurons strengthen whenever hrst (pre-synaptic) 
neuron emits a spike right before the second one, and weakens when the first neuron 
hres right after the second (and so its spike cannot contribute to the second neuron’s 
output). This is a time-dependent extension of the Hebb’s "neurons that hre together 
wire together" rule. Though appropriate for unsupervised learning setting (for ex¬ 
ample, for clustering data), this approach is not as suitable for supervised learning 
(a classihcation task where the network is first trained on labelled data). There is a 
number of proposed algorithms (see section 2.3.2 for more details), however most of 
them have signihcant disadvantages such as only being able to output one spike per 
neuron, or being not suitable for training multi-layer networks. 

Even if we focus on a specihc training algorithm such as SpikeProp [BKLP02], with 
its advantages and disadvantages, still another problem arises. Second-generation ar¬ 
tificial neural networks have a relatively small and manageable number of parameters 
(number of layers and neurons at each layer, learning rate, number of training epochs, 
regularization whenever it is used). In addition to that, a spiking neural network has 
a number of extra parameters coming from the model describing the behaviour of the 
neurons: time constants, coding interval, threshold, time step, nnmber and length of 
delays (when there are multiple delayed synapses between neurons), etc. There seems 
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to be no standard method for selecting even the most crucial set of parameters: target 
output times, the goal for the training... 

1.1 Our results 

In this thesis, we study the problem of conhguring spiking network training algo¬ 
rithms, in particular SpikeProp. As a starting point, we used SpikeProp code for 
the XOR problem available on GitHub [mba], however that code required numerous 
changes. Once we had a working version of SpikeProp, we used automated parameter 
tuner, ParamILS [HHLBS09], to hud a better set of parameters for two benchmark 
classihcation data sets, Wisconsin breast cancer (WBC) and Iris. This allowed us to 
achieve better accuracy on these data sets than what is reported in the literature. We 
also achieve better accuracy for second-generation artihcial neural networks, and arti- 
hcial neural networks augmented with glia cells (astrocytes), improving upon results 
of [Sajl4]. 

Then, we looked at the interplay between different parameters used in the Spike¬ 
Prop algorithm (and in general in spiking neural networks with delays), and their 
effect on the classihcation accuracy. We obtain the following results. 

Improved accuracy in ANNs vs. ANNs with glia Using automated algorithm 
conhguration (ParamILS), we were able to improve the performance of artihcial neural 
network (ANN) and artihcial neural networks with glia (ANGN) presented in [Sajl4] 
from 87% to 96% (ANN) and from 91% to 97% for WBG dataset, and from 80% 
to 90% (ANN), 86% to 96% (ANGN) for the Ionosphere data set (see table 4.1). 
In addition to better accuracy results, this adds more weight to Sajedinia’s hndings 
that adding glia (astrocytes) to an artihcial neural networks improves performance. 
Indeed, even after optimization artihcial neural networks with glia performed better 
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than plain ANNs. Thus, the improvement in accuracy with ANGNs over ANNs seems 
a genuine result of the change of the network structure. 

Improved accuracy for SpikeProp-trained spiking neural network. We were 
able to improve upon accuracy of SpikeProp-trained SNN reported in the original 
paper introducing SpikeProp, [BKLP02]. In particular, for Iris dataset we were able 
to achieve the accuracy of 98.1% (versus 96.1% of [BKLP02]), and for WBC dataset 
our parameter optimization improved the accuracy from 97.0% to 98.5%. 

Weight initialization bounds. We build upon the results of [MLB06] to give new 
formulas for initial weight range for each layer in the network. In contrast to [MLB06], 
our formulas do not involve target times. 

Interplay between membrane time constant and delays. From our exper¬ 
imental results there seem to be a linear dependency between the membrane time 
constant r and the number of synapses each with a different delay d corresponding 
to the region of best accuracy. In particular, for WBC dataset the best performance 
was achieved when the number of delays d was more than 1.6r -|- 1.9, and for Iris 
data set the good performance started from d > 0.96r — 0.21. For both data sets, 
we considered values of r starting from the length of the coding interval (that is, 
interval which contains the range of input data): it was shown in [Maa96] that this 
restriction is necessary for good performance. There is also a drop in performance 
when d becomes too large with respect to r. For WBC dataset and a permutation 
of the original Iris data set, there was a well-dehned line containing pairs (r, d) with 
optimal accuracy (see hgures 4.3, 4.9,4.7). 
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Target times and their ranges. For data sets such as WBC and Iris, it is the 
structure of the data that seems to influence selection of the target times the most. 
Even though the data is not linearly separable, even a simple heuristic of looking at 
the sum of the input values gives a pretty good estimation of the class for these data 
sets: see figures 4.10 and 4.11. Thus, it might be difficult to train a network to produce 
a spike for malignant class for the breast cancer data before a spike for benign class. 
For such data sets, the heuristic sometimes used in the literature [IlLQ+16] of setting 
the target time to an average of a few runs for each class on a randomly initialized 
untrained network seems to give selections similar to what we obtain with ParamILS. 

That said, we do need bounds on the output spike times, not only for the range 
of parameters to give to ParamILS, but also to determine when to stop running the 
network. For that, we develop formulas for the earliest and latest output spikes, based 
on the weight range bounds. 

1.2 Thesis organization 

In Section 2, we cover the basics of learning with artificial neural networks. Section 
3 gives an overview of spiking neuron models, and a survey of algorithms for training 
spiking neural networks. Section 4 covers methods for parameter tuning and auto¬ 
mated algorithm conhguration with ParamILS. The core of the thesis is Section 5, 
where we present our results. Finally, Section 6 summarises the work we have done 
and discusses some ongoing and future work. 
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Chapter 2 


Background 


In this section, we cover definitions of various types of neural networks and learning 
settings, with the focus on artificial neural networks most commonly used in practice: 
second-generation neural networks. 

2.1 Learning algorithms 

There are three different settings for machine learning that appear in practice, in 
particular in the context of learning with artificial, including spiking, neural networks. 
In this work we focus on the first one, supervised learning (classification). Though 
quite natural in the context of artificial neural networks, supervised learning is harder 
for spiking neural networks; biologically inspired mechanisms tend to unsupervised or 
reinforcement models. 

2.1.1 Supervised learning 

Supervised learning is a technique for predicting a label of a previously unseen instance 
from the prior knowledge about input and the target output [Davl3]. It can be viewed 
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as a machine learning task of inferring a function from training data which can be used 
for correctly mapping a class for unseen instances. Each training sample (consisting 
of an input vector X, and its corresponding target output vector Y) may feed into 
network several times so that the actual output can approach the target output. 
An error value is calculated from each given sample as a function of the difference 
between the target outputs vector, Y and the actual output vector, Z (for example, 
min square error). In neural networks, this error is used to update connection weights 
in the network, so that network can generate a result closer to or exactly the desired 
output next time if similar input pattern arrives. Two most common approaches to 
minimise this error in spiking neural networks are gradient descent rule and learning 
windows rule [XQYK17]. In order to minimise errors, gradient descent based learning 
algorithm Ends a local minimum of linear systems. The second type changes the 
synaptic weights as a function of the relative timing of pre- and postsynaptic action 
potentials. 

2.1.2 Unsupervised learning 

Unsupervised learning is a technique for exploring data to find some intrinsic struc¬ 
tures in the input under an unknown probability distribution. The convergence analy¬ 
sis of unsupervised learning is much more complicated than other learning as the input 
datasets are unlabelled. For traditional artihcial neural networks, a n-dimensional in¬ 
put is processed by the exact same number of computing units or by minimising 
a cost function for clustering, feature extraction, dimension reduction, and others. 
Self-organizing map (SOM), adaptive resonance theory (ART), principal component 
analysis (PCA), Independent Component Analysis (ICA), Hebbian learning and BCM 
rule are commonly used unsupervised learning algorithms for non-spiking neural net¬ 
works. 
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In case of spiking neural networks, Hebbian learning, BCM rule, and STDP rules 
are being successfully used as unsupervised learning methods in real-world applica¬ 
tions. STDP learning is an asymmetric form of Hebbian learning in a sense of tight¬ 
ening temporal correlations between weakening and strengthening the connection. 
Since unsupervised learning takes into account competition and lateral inhibition, the 
weights of the winner neurons (ones that emit the first spike) are increased while other 
neurons suffer a small weight reduction [IRMGM+IS]. Similarly, STDP learning rule 
considers the lateral inhibition between pairs of spikes: a pre-post pairing causes po¬ 
tentiation and a post-pre pairing causes depression. The most recent presynaptic and 
postsynaptic spike pair is used to detect the correlations of next attempting fire. 

2.1.3 Reinforcement Learning 

Reinforcement learning is a control optimization technique which is used to recognize 
the best action in every state visited by the system. In reinforcement learning a 
general error signal back (“reward”) is determined in every state that describes how 
well the system is performing. The typical framing of a reinforcement learning is the 
following scenario. An agent takes actions in an environment. Based on the action it 
changes state, and the learning algorithm also receives a reward signal a short time 
later. The current state and reward both are then fed back into the agent. The 
algorithm modihes its strategy in order to achieve the highest reward. 

2.2 Artificial Neural Networks 

Much of neural networks success story in recent years is due to adapting some powerful 
set of learning techniques in neural networks which is called deep learning. Intelligent 
adaptive control, decision support, complex systems identihcation, image compression. 



pattern recognition, optimization, signal processing, speech recognition, face recogni¬ 
tion and natural language processing is also driven by advances in neural networks 
and deep learning. Though a history of the neural network had made by William 
James in 1980 after a long time neurophysiologist Warren McCulloch and mathe¬ 
matician Walter Pitts first developed a simple neural network model in 1943 which 
was capable of computing any arithmetic or logical function [YadlS]. The founder of 
Neurocomputing Frank Rosenblatt invented the basic version of an artihcial neuron 
called perceptron. 

2.2.1 Backpropagation Algorithm 

Feedforward neural networks do not contain any feedback connections to the previ¬ 
ous or current layer. They can only pass information to next layer through neuron 
response function which is evaluated from input value X, and the intermediate con¬ 
nection weights, W and hnally reach the output neuron. Backpropagation rule is 
applied to network from output layer to consecutive previous layer to adjust weights 
so that the network can learn mapping arbitrary unseen inputs to exact outputs. How 
backpropagation algorithm works to train feedforward neural network has been shown 
by an example (see hgure 2.1). Here, we have shown feedforward network with one 
hidden layer H, two bias neurons { 81 , 82 ) and one output neuron that needs to be 
trained up to get target (0.99) result. By this example we will explain working pro¬ 
cedure of feedforward neural networks in four main steps: feedforward computation, 
backpropagation at the output layer, backpropagation at the hidden layer and weight 
updates. For more detailed explanation see, for example, this link: [Maz]. 
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Figure 2.1: Incorporated with feedforward neural network 


Feedforward Computation: 

To know the output of each neuron hrst we need to determine the weighted sum of 
all incoming inputs of a neuron. Suppose the weighted sum of incoming input of a 
neuron is iun and output is op at. 

Weighted sum of neuron Hi. inni = Wi * Xi + W 2 * X 2 + + Wg * Bi = 0.35 * 

0.5 + 0.3 * 0.8 + 0.25 * 0.3 + 0.35 * 1 = 0.84 

Output of neuron Hi. opni = ^ i+e-o-s^ = 0.69846 

Similarly; inH 2 = 0.69 and opH 2 = 0.66596 

inoi = 0.74566 and opoi = 0.67823 

Network error is determined by the following function of the difference between actual 
and desired output: 
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E = ^{desired — actual^ = \{dout — opoiY = ^(0.99 — 0.67823)^ = 0.04860 


Backpropagation at the output layer: 

A portion of the network error E is fed back to consecutive previous layer (see figure 
2.2) to change the connection weights so that the actual output and the target output 
can get closer. For this reason the chain rule is applied to know how much each 
connection has been affected according to network error. The partial derivative of E 
with respect to w^ describe the affected portion of that connection. 

&E _ SE ^ Sopoi ^ Sinpi 
Sw7 ^opQi SiriQi Sw7 



1 

Figure 2.2: Illustration of backpropagation algorithm. Adapted from [Maz] 


Step 1: 

E = \{dout - opoi? 

= -(0.99 - 0.67823) = -0.31177 

Sopol ^ ^ 

Step 2: 


opoi = 
Sopoi _ 

6 inQi 

^OPOI _ 
SiriQi 


_ 1 _ = (T -i- p-inoi)-! 

l+e-^^oi VJ- W c j 

-1(1 + e--oi)-2 ^ 4(i+g7Fl) = opoiil - opoi) 
0.67823 * (1 - 0.67823) = 0.21823 


Step 3: 
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inoi = Wj* OPHI +W8* 0PH2 + -02 * 1 


= opHi = 0.69846 


Putting it all together: 


5E 

Swj 


SE 


Sopoi 


Sinpi _ 

SW'Y 


* = -(0.31177) * 0.21823 * 0.69846 = -0.04752 


Sopoi Sinoi 

After combining abvove three step, we can generalised the chain rule: 
S = 4 I 7 * fe * ^ - ^^ 01 ) * opolil - opoi) * iUHl 

Rewrite the equation by replacing Sqi = {dout — opoi) * opoi{l — opoi) 


^ E c 

_ = * lUHl 


In general, If we denote the backpropagated error at the jth node by 6 j , and weight 
between node i and j by Wij; then we can say Awij = —p * opi * 5j {p = learning rate) 

Backpropagation at the hidden layer: 

SE _ 5E ^ Soutni ^ Snetni 
Swi Soutjji Snetni Swi 

The way has shown above is applied to consecutive previous layers with the partial 
derivative of the error E with respect to their weights. This process continues until 
the input layer is reached. 

Overall, the above process is repeated for every sample until the network converges. 


2.2.2 Augmenting ANNs with astrocytes 

Recent advances have revealed that there is another type of cells called astrocyte (the 
most abundant type of macroglial cell) which is affecting signaling on synapses as 
well as impacting on neuronal information processing [PSA14]. Although astrocytes 
cannot propagate action potential (AP) like neurons do, they release neuroactive 
substances to communicate over short distances called “gliotransmitters”. According 
to Pereira and Furlan [PFIO], astrocytes and neurons play a different vital role in the 
nervous system: astrocytes propagate a substance which is responsible for conveying 
"the feeling" and neurons carry information about "what happens" [Sajl4,PFIO]. 
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In 1985, Dr. Marian Diamond published anatomical studies of slivers of Einstein’s 
brain where she claimed that Einstein’s brain had a greater ratio of glial cells to neu¬ 
rons compared to a sample group of 11 other brains [DSMH85]. Research has shown 
that the density of astrocytes can vary from region to region in the central nervous 
system (CNS) approximately 20 to 40 percent. Glia astrocytes can communicate in 
a bi-directional manner with neurons and other astrocytes by releasing transmitters 
and propagating calcium spikes. The term "tripartite synapse" refers to a concept 
of coupling (see hgure 2.3) between astrocytes and neurons (pre and postsynaptic) 
which provides a pathway for chemical communication between the cells [WMH+ll]. 

A computational model of glia astrocyte named ANGN (artihcial neuron glia net¬ 
work) introduces by Pazos et al( [PGP09]) in 2009, where the connectionist systems 
was designed by adding an artihcial glia astrocyte with each neuron [Sajl3]. From 
the computational point of view, it is assumed that each astrocyte counts the number 
of bring and non-hring times of its associated neuron for a specihc period of time. 
Depending on active and inactive times of the neuron, the connected weights will be 
changed by a pre-dehned factor [Sajl4, AGPPP12]. The effect of astrocytes in such 
network can be defined by the following parameters: 

k: number of iterations (number of times each sample is fed into the network) 
a : weight increment rate 
b : weight decrement rate 

c : threshold on the number of times neuron has bred or not during k iterations. 
If a neuron remains active c times, the weights of the connections will be increased 
by the factor of a, while connection weight will be decreased by the factor of b if the 
neuron remained inactive c times during k iterations. 


13 



Presynaptic Neuron Postsynaptic Neuron 


i i 



Figure 2.3: Network consists of presynaptic neuron A, postsynaptic neuron B and an 
interconnecting tripartite synapse. Adapted from figure 3 in [WMH+11]. 

2.3 Spiking Neural Networks 

The network model which is comprised of spiking neurons is called spiking neural 
network (SNN). Designing a spiking neuron model starts with mimicking the charac¬ 
teristics of a biological neuron. Wolfgang Maass described a mathematical formation 
of a formal spiking neural network in ( [Maa95, Maa96]) [Maa97]. 

Each spiking neuron operates by integrating the voltage sum of all presynaptic 
spike in a time-dependent manner. It emits an output spike (see hgure 2.4 A) when the 
excitory post synaptic potential (EPSP), also called the membrane potential, reaches 
a threshold value. Thus, in a spiking neural network each neuron conveys information 
to the next layer by individual spike times. When a single presynaptic spike arrives 
from presynaptic neuron i to a postsynaptic neuron j at time ti, it generates a post 
synaptic potential (PSP) which reflect membrane potential (see hgure 2.4B). The 
membrane potential at current time t is dehned by V{t): 

V{t)= WijEjit - ti) (2.1) 

ier^ 
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Here, the kernel function e describes the post-synaptic potential (PSP) contribution 
by each incoming spike (see figure 2.4C), Pj is a set containing the spike times tj 
emitted by all the presynaptic neurons of the neuron i, and Wij are the corresponding 
weights. Lets {t — ti) = s; if s <= 0, then e{s) = 0. Otherwise, 


£(s) 


T 


( 2 . 2 ) 


where the membrane time constant r determines the rise and decay time of the PSP. 





Figure 2.4: (A)Each input pulse causes an excitatory postsynaptic potential (EPSP). 
(B)All EPSPs are added one after one. When it reaches the threshold voltage v, 
it generates an output spike. (C) The value of e(t) is maximum at membrane time 
constant r. Taken from [Boh]. 


A spiking neuron (see figure 2.5B) computes membrane potential at each point 
of time as a function of input spikes, while a sigmoidal neuron (see figure 2.5A) 
computes output voltage after receiving all input from the previous layer. Spiking 
neurons work with precise timing of spike, biologically more plausible, considerably 
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faster and computationally more powerful [Maa97]. The artificial sigmoid neuron 
replaces a sharp threshold with a smoother function with sigmoid or logistic and 
produces real-valued output bounded by (0,1). 




Figure 2.5: Response of neuron after receiving input. (A) The output pattern of a 
sigmoidal neuron. (B) The membrane potential of a spiking neuron. 


Spiking Neuron Models 

Over time, multiple models of a biological neuron have been proposed, of varying 
complexity. Here, we survey common computational models of a biological neu¬ 
ron, inlcuding hodgkin huxley model (HHM), integrate-and-hre model (IF), leaky 
integrate-and-hre model (LIF), and spike response model (SRM). 

2.3.1 Biological Neuron 

A neuron is an electrically excitable cell that receives, processes, and transmits infor¬ 
mation to other cells in the body through electrical and chemical signals. Biological 
neuron (see hgure 2.6) mainly consist of three major parts: dendrites by through 
information comes into the neuron from other neurons, cell body (soma), the main 
part of the neuron, which processes information and then passes it along to the third 
main part, an axon. The operation of nervous system depends on how well neurons 
communicate with each other. For an electrical signal to travel between two neurons, 
it usually must hrst be converted to the chemical signal. Then it crosses a space about 


16 











Presynaptic 

Neuron 


Axon 


Synapse 


I Body 


Nucleus 


Nucleus 


Postsynaptic 

Neuron 


Figure 2.6: Demonstration of main part of biological neuron 

one-millionth of an inch. This space is called synapse and the chemical signal is called 
neurotransmitter. Neurotransmitters allows billions of neurons in the nervous sys¬ 
tem to communicate with each other. Though number of encoding strategies such as 
binary coding, rate coding, latency coding, fully temporal codes, population coding, 
predictive spike-coding, probabilistic spike-coding, etc are being used to understand 
the responses to stimuli, the two most popular types of encoding methods are the 
frequency of spiking (rate coding) and the bring times of spikes (pulse or temporal 
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coding). 

In the algorithms we consider, information is encoded in the precise spike firing 
time [XZHY13]. If the ranges of input values are very different, then each feature 
value is often mapped to a high-dimensional space by dehning some function such 
as Gaussian RFs or square cosine encoder. In our experiments, we do not use such 
encodings, instead setting spike times directly to the (normalized) input values. 

Integrate-and-Fire model 

The neuron model where the action potentials (very rapid change in membrane po¬ 
tential when a cell membrane is stimulated) are described as events is called the 
integrate-and-hre model. This model is represented by a linear differential equation, 
which is the time derivative of the law of capacitance, Q = CV. 

= 4 P-3) 

When an input current I{t) is applied, the membrane capacitor is charged with time 
until it reaches a constant threshold voltage Vth, at which point a sharp electrical pulse 
called spike is triggered and the voltage is reset to its resting potential. The bring 
frequency of the model increases linearly without bound as input current increases. 
By introducing a refractory period tref, we can limit bring frequency of a neuron 
by preventing it from bring during that period. The limitation of integrate-and-bre 
model is that if it receives a below threshold signal at some time, it will retain that 
voltage boost forever until it bres again. 
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Figure 2.7: Integrate-and-Fire model 


Leaky Integrate-and-Fire model (LIF) 

The leaky integrate-and-fire model neuron is a commonly used spiking neuron model, 
owing to its relative simplicity and analytical tractability. The limitation of the 
integrate-and-hre model is solved in Leaky Integrate-and-Fire model by adding a 
term "leaky integrator" to the membrane potential. The basic circuit of an integrate- 
and-hre model (see Fig.2.7) consists of a capacitor C in parallel with a resistor R 
driven by a current I{t). The driving current can be split into two components. 


/(() = Ia + Io 


(2.4) 



(2.5) 


Equation 2.5 derives from equation 2.4: multiply equation 2.4 by R and introduce 
the time constant = RC of the ‘leaky integrator’. This yields the standard form 
of Leaky Integrate-and-Fire model . We refer to v{t) as the membrane potential at 
time t and to as the membrane time constant of the neuron. 
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v(t) + RI(t). 


( 2 . 6 ) 


Tmdv- = - 

In a simulation, spiking events of LIF model are used in a slightly different way. 
The membrane potential can be described explicitly by the constant input current, 
time-varying input current and synaptic currents. In a more realistic case, spikes are 
characterized by a bring time where the neuron is stimulated by pre-synaptic spikes 
arriving at its synapses [GK02b]. Post-synaptic current to the ith neuron at time t is: 

I{t) = (2.7) 

i f 

Spike Response Model (SRM) 

The spike response model (SRM) is a generalised version of the leaky integrate-and- 
bre model. It includes a new term "refractoriness", which describes the temporary 
inability of generating new spikes immediately after bring. 

The SRM produces very good predictions of the target spike trains over a broad 
range of means and standard deviations of the injected current [JTG03]. Suppose 
that the neuron has bred its last spike at time t. At each time t > t, the state of this 
spiking neuron at time t is represented by u{t)\ 

+00 

u{t) = Tfit — t) + J k{t — t, s)I{t — s)ds (2.8) 


Here, kernel k is the linear response to an input pulse. The form of the action potential 
and the after-potential is described by a function rj. Function ri(t — U) debnes the 
refactory period of a neuron and the last term accounts for the ebect of an external 
current I(t). 
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SRMq models 


SRMq models is a simplified version of the Spike Response Model. In SRMq models 
the state of spiking neuron at time t is represented by u{t)\ 

OO 

Uiit) = ri{t - ii) + - tj) + f ko{s)R''\t - s)ds (2.9) 

j b 0 

r]{s) = 6{s)-rioexp{—-) (2.10) 


eo(s) = _^^ (exp(-^) - exp(-^)) (2.11) 

‘ m 

Where, 

s is the time after the arriving spike 
Tc is the current time constant. 

Tm is the membrane time constant. 

Hodgkin—Huxley model 

Hodgkin-Huxley model (see hgure 2.8) describes the current at the neuron in terms of 
of three types of channels: a sodium channel with index Na, a potassium channel with 
index K and an leakage channel with index L. This is the most biologically realistic 
model, but it is computationally more difficult to simulate. 

I = C^^+hon ( 2 . 12 ) 


where Cm is the membrane capacitance per unit, Vm is the intracellular membrane 
potential, Ron is the sum of all ion channels current, and I is the externally applied 
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Figure 2.8: Hodgkin-Huxley model 

current. The most significant advantage of the above equations is to determine the 
membrane capacitance in such a way that is independent of the sign or magnitude of 
the intracellular membrane potential and minimally affected by the time course of Vm- 

Here, 

lion = gNA{Vm — Vna) + OkiVni — Vk) + “ Vl) (2-13) 

where 

Qna = sodium conductances per unit area 
qk = potassium conductances per unit area 
ql = leak conductance conductances per unit area 
Vna = sodium reversal potentials 
Vk = potassium reversal potentials 
ql = leak reversal potential 

Hodgkin-Huxley model is the most biologically realistic model, but it is computa¬ 
tionally difficult to simulate. 
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Izhikevich model 


Izhikevich [Izh03] attempts to create a more computationally efficient version of 
Hodgkin-Huxley model by using a system of differential equations with a quadratic 
function of the membrane potential v. He uses an auxiliary variable u to represent 
the negative feedback from activation of K~^ and inactivation of Na'^ channels. The 
derivatives are taken with respect to time, and a, b, c, d are parameters. The variable 
I denotes the external current. 

v' = 0.04u^ + 5u + 140 — u + I 
u' = a{hv — u) 

if u > 30ml/, then v = c, and u = u + d 

This nenronal model is capable of reproducing a variety of spiking patterns, in¬ 
cluding RS (regular spiking), IB (intrinsically bursting), and mixed mode bring pat¬ 
ters, FS (fast spiking), TC (thalamo-cortical), RZ (resonator) and LTS (low-threshold 
spiking) [Izh03]. 

2.3.2 Supervised learning with spiking neural networks 

Several learning algorithms have been proposed to train spiking nenrons. In terms on a 
nnmber of spikes snpervised learning algorithm can be subdivided into two categories: 
single spike and multiple spikes. 

Survey of supervised learning algorithms for SNNs 

S.M Bohte et ah [BKLP02] proposed a gradient descent-based supervised learn¬ 
ing algorithm called SpikeProp, that transfers information in the timing of single 
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spike [BKLP02]. Sam McKennoch et al. (2006) made some modifications on Spike- 
Prop by adding momentum and has shown faster convergence on his proposed mod- 
ihed algorithm QuickProp and RProp (Resilient Propagation) [MLB06]. But still, 
these error backpropagation algorithms were limited in that they did not allow mul¬ 
tiple spikes, which made them less realistic to perform a more complex computation. 
Weight limit learning algorithm is another simplihed version of SpikeProp algorithm 
has been proposed by Q.X.Wu et al. (2006) [WMM+06]. Inspired by biological neu¬ 
rons, a weight limit constraint is applied on SpikeProp to ensure that all neurons hre 
at least once in a simulation period. The training accuracy depends on the proper 
adjustment of synaptic weights of the network, such that the actual outputs are close 
to target outputs. Since the network error is calculated as a function of the time 
difference between actual and target bring of a neuron, special learning rules need to 
be added if a neuron does not bre during the simulation period. 

Later, Y. Xu et al. [XZHY13] introduced another gradient descent based super¬ 
vised multi-spike learning algorithm, where error function has been constructed con¬ 
sidering the same number of output spikes dynamically as of target spikes in each 
learning epoch. Tempotron is a gradient-descent-based single layer supervised learn¬ 
ing algorithm introduced by Gutig and Sompolinsky. It only works for binary classi- 
bcation, though it does allow multiple spikes [GS06]. 

An error back propagation multi-spike supervised learning named normalized spik¬ 
ing error back propagation (NSEBP) [IILQ’''16] is a multilayer training algorithm. 
Unlike SpikeProp, QuickProp, and RProp, there are no delays in NSEBP network. 
The computational error is propagated back to previous layers by presynaptic spike 
jitter instead of the traditional gradient-descent rule. In the feedforward calculation, 
the time boundaries are determined to avoid the action potential going inbnitesimal. 
Presynaptic spikes between this time boundaries are selected for hidden layer neuron. 
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Random spikes are added in a hidden layer if there is no spike generated between 
this range. For improving training efficiency only the output voltage is determined at 
target time and ignore the state of other times. This learning approach adopted with 
STDP has been described by same authors in accurate synaptic efficiency adjustment 
(ASA) supervised training method [XQYK17]. 

Ponulak proposed a remote supervised method (ReSuMe) for a spiking neuron 
[PonOS], and it has been revised to train SNNs afterward [KP05, Pon06, KPK06, 
PBR08] [PAIO]. The fact behind the naming "remote supervised method (ReSuMe)" 
is that the target signal does not directly influence the membrane potential of the cor¬ 
responding learning neuron [XZZ13]. Instead of gradient descent method, ReSuMe is 
motivated by Widrow-Hoff rule and it minimizes the error based on the interaction 
between two spike-timing-dependent plasticity processes instead of gradient calcula¬ 
tion. An anti-STDP process is applied for weakening synapses while the input spike 
trains are co-related with the actual spike trains and STDP process for strengthen¬ 
ing synapses while the input spike trains are co-related with the target spike trains 
respectively [XZHY13]. 

A new concept for training spiking neural network based on plasticity window, 
called synaptic weight association training (SWAT), was published in 2010 by John 
J. Wade, Liam J. McDaid, Jose A. Santos, and Heather M. Sayers [WMSSIO]. There, 
STDP is combined with Bienenstock-Cooper-Munro (BCM) theory, where a single 
neuron is used as a training neuron and data associated with all class is passed to this 
neuron. A sliding threshold associated with BCM model controls long term depression 
(LTD) over long term potentiation (LTP) of STDP so that the action potential of the 
neuron cannot go to inhnitesimal. 

Spike-based processes, such as Long-term potentiation (LTP), long-term depres¬ 
sion (LTD) and spike-timing dependent plasticity (STDP), are being widely used 
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as Unsupervised learning for spiking neural network. It has been investigated and 
explored in the literature ( [BSA89, GK02a, GKvHW96, MLFS97, KvRST02, Kis02]). 
[KP06] 

The summarized view of supervised learning algorithms with spike time coding of 
SNN is shown below: 
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Algorithm 

Model 

Encodings 

Summary 

References 

SpikeProp 

SRM; multi¬ 
layer; with 

delay 

Gaussian 

RF; single 

spike 

No random noise, back propaga¬ 
tion,gradient descent 

[BKLP02] 

Learning 

under 

weight 

constraints 

SRM; multi¬ 
layer; with 

delay 

Square 

cosine 

RF; single 

spike 

weight limit constraints are applied 

to SpikeProp algorithm. 

[WMM+06] 

NSEBP 

SRMq; multi¬ 
layer; no de¬ 
lay 

Gaussian 

RF; multi¬ 
ple spikes 

Weight from i/p to hidden are only 

adjusted and o/p neuron has weight 

1; use presynaptic spike jitter in¬ 
stead of gradient descent. 

[HLQ+16] 

ASA 

SRMq] multi¬ 
layer; no de¬ 
lay 

Gaussian 

RF; multi¬ 
ple spikes 

Learning is completed when volt¬ 
age of output neuron is equal to 

threshold; Weight from i/p to hid¬ 
den are only adjusted using normal¬ 
ized STDP rule 

[XQYK17] 

ReSuMe 

Izhikevich; 

single layer; 

with delay 

Single 

spike 

Learning proceeds by correlation of 

spike times instead of gradient de¬ 
scent; uses reference signal; learning 

depend on network size and learning 

window. 

[PonOS] 

[PAIO] 

SWAT 

LIF; multi 

layer; no 

delay 

Single 

spike 

Use a single training neuron in 

training phase; use combined STD- 

P/BGM training rule. 

[WMSSIO] 

Tempotron 

LIF; single 

layer; no 

delay 

Single 

spike 

27 

Gaussian noise added to all spike 

time; only work for binary classih- 

cation. 

[GS06] 





Figure 2.9: Spiking neural network with delay path 


2.3.3 SpikeProp 

In this section, we will demonstrate the procedure of training a multilayer spiking 
neural network with a error back propagation based learning algorithm which is ca¬ 
pable of learning complex nonlinear tasks. Connectionist system of our feedforward 
spiking neural network with multiple delayed synaptic terminal has been shown in 
hgure 2.9 

• Delayed sub-connections: 

Assume that there are k number of delay path way associated with each synaptic 
connection between neuron i to j. Let ti be the bring time of a pre-synaptic 
neuron i, and dk is the delay associated with the synaptic terminal k, then the 
membrane potential is written as follows: 


W = EE w-jEjit - - dk) (2.14) 

i&Fj k 

• Back propagation of the network error: 
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When the membrane potential reach to threshold voltage, a output spike is 
emitted from neuron j which goes to next layer (output neuron). The timing 
of emitted spike is called actual bring time. An error E is calculated by taking 
difference between target firing time tf' and actual firing time 

( 2 . 15 ) 

j 

The error is then back propagated to network and according to error of subse¬ 
quent layer the synaptic weights of associated layer are updated. 

• Weight modifications for the output neurons: 
we need to calculate 

A-t = ( 2 . 16 ) 

( 2 . 17 ) 

• Weight modifications for the hidden neurons 

= -vyi{u)5i ( 2 . 18 ) 
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Chapter 3 


Parameter Tuning 


Setting the parameters is one of the most challenging tasks in designing effective al¬ 
gorithms. To optimize the performance of a target algorithms, we look for expert 
experience, rules of thumb, or sometimes brute-force search for parameter conhgura- 
tion. Unfortunately, this manual parameter tuning approaches is often tedious and 
unsatisfactory. 

3.1 Manually Tuning 

In the famous Coursera ML course by Andrew Ng [Ng], it is suggested to look at 
the "learning curves", and compare the behaviour of the training error and the cross- 
validation error. This can indicate whether low accuracy is due to underhtting or 
overhtting. 

There are many possible options that could improve the performance of the learn¬ 
ing algorithm such as getting more training examples, smaller set of features, getting 
an additional feature, adding polynomial features, decreasing learning rate, and in¬ 
creasing learning rate. If the training set becomes larger and larger, the average 
training error will increase simultaneously while cross-validation error and testing er- 
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ror will decrease as most of the data fit to train set. Causes of poor performance of 
machine learning algorithms could be either underhtting or overhtting problem. 

High Bias (underfitting): High bias problem occurs when an algorithm made the 
wrong assumption between features and target outputs. A model falls into underht¬ 
ting if the algorithm is having a low variance but high bias problem. When a model 
experiences both high cross-validation error and high training error, it is a sign of 
the high bias problem. The techniques such as decreasing regularization parameter 
A, adding features and polynomial features are the powerful steps, but getting more 
training data will not help much when it is suffering from the high bias problem. 

High Variance (overfitting): In reality, we would want to choose a model with 
low bias and low variance. When a model experience low training error but high 
cross-validation is referred to as high variance problem. Cross-validation error is 
always much bigger than the training error for high variance problem. If a learning 
algorithm is suffering from high variance problem, then getting more training data, 
smaller sets of features, and increasing the regularization parameter are likely to help. 

It is harder to apply this reasoning to SNN, as there are too many parameters 
that might influence the outcome. 

3.2 Automated parameter configuration 

Automated parameter configuration deals with an algorithm whose performance is 
to be optimised under a given domain of parameters. Depending on the behaviour 
of parameters, it determines the nature of the conhguration space so that the target 
algorithm can perform well in the new domain other than in given instance set. Some 
of the standard techniques for automated algorithm conhguration are the exhaustive 
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search, hill climbing search, and genetic algorithms etc. Grid search and random are 
the most commonly used algorithm conhguration strategy. 

Exhaustive Grid Search It exhaustively generates candidates model by choosing 
their parameter values from a grid and evaluate the model for each combination of 
algorithm parameters using cross-validation. 

Random Search Since exhaustive grid searches expensive, there is an alternative 
method called random search comparatively more powerful than grid search in high¬ 
dimensional spaces [BB12]. It selects a random combination of hyperparameter values 
from specifying range, without repetition, and evaluate the models sequentially. 

3.3 ParamILS 

Automated algorithm conhguration tools ParamILS, which is based on the idea of 
iterated local search (ILS) in parameter conhguration, was developed for optimizing 
SAT and MILP solvers by Butter et al [HHS07]. It is a state-of-the-art method for 
parameter tuning and algorithm conhguration which is actively being used to improve 
the variety of application including artihcial intelligence. It can improve parameter 
conhguration of a large and complex model by avoiding an unnecessary run of the 
algorithm. With a given ranges of parameters, ParamILS capable of optimising algo¬ 
rithm in term of minimum error, number of successes and minimizing computational 
resources such as runtime, memory, or communication bandwidth etc. To conhgure 
an algorithm with ParamILS, it will have some characteristics such as parameterized 
algorithm, a domain of parameters, a set of problem instances, and an objective func¬ 
tion. ParamILS perform multiple runs with a diherent combination of the parameter 
from parameter conhguration space that yields the best outcome across the bench- 
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mark dataset. For deterministic algorithms configuration, each run varies with input 
instances but the seed of algorithm is fixed. This feasible combination of parameter 
values depends on some conditional parameters such as: 

cutoff_time define a specific time after which the candidate algorithm will be ter¬ 

minated. 

cutoff_length define a specific run length after which algorithm will be terminated. 

tunerTimeout refer validation of the final best found parameter configuration. 
Automated algorithm configuration tool make a decision by addressing the following 
choice with the lowest cost. 

1. Cutoff time for each run? 

2. Number of runs for each instance? 

3. Which problem instances have to consider for evaluating each parameter configu¬ 
rations? 

4. Which parameter configurations should be evaluated? 

A comparison is made between previous optimised algorithm which is reffed to as 
target algorithm and candidate algorithm whose performance has to be optimized. 
Adaptive capping approaches has been used in ParamILS for selecting the number 
of problem instances and to determine the cutoff time while ILS use for searching 
parameter configuration space. 

ILS method 

iterated local search (ILS) generates a sequence of local optima by employing repeated 
random trials to get better one. The way of searching the local neighbourhood and 
acceptance criterion to decide whether to keep or reject a newly obtained candidate 
solution is as follows [HZHS13]. 

1. Initial solution: generate an initial solution. 
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2. Local search: get another solution perturbation to escape from local optima. 

3. Perturbation: random move in higher order neighbourhood. 

4. Acceptance criterion: compare and choose a better solution with minimum cost. 

This iteration process is repeated until met the termination condition. ParamILS 
follows the above procedure to hud substantially improved parameter conhgurations. 
It generates initial solution with a default and random initialization. A subsidiary 
local search procedure is employed at random with changing only one parameter at a 
time. After accepting a parameter conhguration, it reinitializes the parameter setting 
with a given probability. 

Adaptive capping method 

ParamILS evalutes parameter conhguration pairwise with the same number of run on 
same instances and seeds. Without adaptive capping, this evaluation process takes a 
long period of time. It is capable of avoiding an unnecessary run of the algorithm. Let 
evaluate a pairwise ( ©i and © 2 ) parameter conhguration process on same number 
of instances {N = 100) to see how it avoid unnecessary run. Suppose ©1 needs 10ms 
and ©2 needs 50ms to process 100 instances. Processing time per instance for ©1 is 
10/100 = 0.1ms and for ©2 is 0/100 = 0.5ms. ParamILS compute the lower bound of 
cost function after each run. As the runtime of ©1 is less than © 2 , it will reject ©2 after 
10ms. ParamILS compare parameter conhguration with a bounded evaluation period 
and evaluate all parameter conhgurations and select the best one with probability 
arbitrarily close to one. When the lower bound over the bounded time, it skips the 
rest run. 
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Chapter 4 


Configuring Neural Networks 


The main focus of this work is on configuring neural network training algorithms, in 
particular SpikeProp. First, we use automated parameter configuration (ParamILS) 
described in the previous chapter to see how much an accuracy can be improved by 
optimizing the parameters of learning algorithms. Then, we look at the interplay 
between different parameter settings for spiking neural networks, in particular initial 
weight range, as well as membrane time potential r and number of delays d, and 
analyse the datasets to see why the common heuristic for setting target times seems 
to give good results. 

4.1 General Methodology 

We used ParamILS to configure two supervised learning algorithms, backpropagation 
(with and without glia) and SpikeProp. In both cases, we used neural networks 
with one hidden layer. The optimization was done with respect to the percentage of 
correctly classified instances in test data. We supplied a range for each parameter, in 
particular for the number of hidden layer nodes, learning rate, number of epochs and 
a variety of spiking neural network parameters. 
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4.1.1 Datasets 


We used the following publicly available datasets, for which previous accuracy data 
for SNN classihcation was available. 

Wisconsin breast cancer dataset: Wisconsin breast cancer dataset (WBC) con¬ 
tains 699 samples, divided into benign and malignant class with 16 cases of missing 
value. Remaining 683 samples contain 444 benign and 239 malignant type data. Each 
sample contains 11 attributes including an identihcation number, input attributes, and 
class. First attribute considered as an identihcation number, middle 9 attributes as 
an input variable and last attribute as a class. 

Iris dataset: Iris dataset contains 150 samples having 3 classes and each sample 
has 4 attributes. 

Ionosphere dataset: The 351 instances of ionosphere dataset were divided into 
175 training and 176 testing instances. The input layer consisted of 34 neurons, 
each neuron corresponded to one attribute and the output layer had two neurons, 
representing “good” and “bad” 

4.1.2 Cross Validation 

A model performance is determined by how well the target function from training data 
of that model generalizes to new data. Though a learning algorithm hts on training 
set well, it might fail to generalize new data. Cross-validation actually can help to 
pick parameter that is going to be best. It is the most commonly used parameter 
tuning method which also helps to avoid a lot of testing. Generally, a randomly 
sorted dataset is divided into the training set, validation set and testing set. 
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In our experiments, we generated 10 permutations of each dataset. We used 5 of 
them as ParamILS training datasets (that is, for model parameter optimization), and 
5 for model parameter testing (i.e., as ParamILS’s testing data). For each individual 
experiment, the dataset was split in half, with the hrst half used for training, and 
the second half for testing. In this way, the actual training was done on different 
set of samples for each of the 10 permutations of datasets, and testing done on the 
remaining samples. 

4.2 Optimizing ANNs and ANGN 

The network model ANGN inspired by the work done by Zahra Sajedinia [Sajl4] 
is an extension of ANN where a new processing element called glia astrocytes was 
added to the network. It was tested with two benchmark dataset (Ionosphere and 
Wisconsin breast cancer). Backpropagation learning algorithm was used for training 
both ANN and ANGN network. We have worked on both ANNs and ANGNs network 
and conhgured a number of parameters that are responsible for varying accuracy. 
Gomparison table (see Table: 6.1) demonstrate that conhgured network outperform 
typical network. 

ANNs 

Gonhgured parameters value for ANNs such as Learning rate a, number of neuron 
at hidden layer m, number of epoch e and others non-conhgurable parameters such 
as number of input neuron n and number of output neuron o has been shown in the 
table: 6.2. 
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Dataset 

ANN ( [Sajl4]) 

Configured ANN 

ANGN ( [Sajl4]) 

Configured ANGN 

WBC 

87% 

96% 

91% 

97% 

Ionosphere 

80% 

90% 

86% 

96% 


Table 4.1: Comparison result between ANNs and ANGNs 


Dataset 

n 

m 

0 

a 

e 

accuracy 

WBC 

9 

17 

2 

0.03 

100 

96% 

Ionosphere 

34 

18 

2 

0.35 

700 

90% 


Table 4.2: ANNs simulation parameters 


ANGNs 

List of configured parameters of ANGN such as learning rate a, number of neurons at 
hidden layer m, number of cycles to activate glia k, Weight increments rate a, weight 
decrements rate b, and glia threshold 6 has been shown in table 6.3. 


4.3 Optimizing SNN with SpikeProp 

In this section, we present results of our experiments with ParamILS-optimized Spike- 
Prop. 


Dataset 

n 

m 

o 

a 

e 

a 

b 

k 

e 

accuracy 

WBC 

9 

10 

2 

0.45 

100 

1.25 

0.55 

80 

0.95 

97% 

Ionosphere 

34 

10 

2 

0.1 

1000 

0.7 

0.75 

0.35 

0.65 

96% 


Table 4.3: ANGNs simulation parameters 
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4.3.1 SpikeProp implementation 

In our experiment, we use discrete time slots like Bohte et al [BKLP02]. As datasets we 
used to have comparable ranges of inputs, we did not encode inputs with the Gaussian 
receptive held, but fed direct input to the network; that made for a smaller network. 
We used 1 output neuron for all experiments, differentiating between classes by target 
times. Coding interval is dehned by the range of data. All experiments were done 
on computer with conhguration:Intel (R) Xeon (R) CPU X5550 @ 2.67GHz, x86_64, 
32-bit, 64-bit CPU op_mode (s), 4 core (s) per socket, 16 CPU (s), 6 CPU family 
and 26 model. 

To select target times, we compute potential earliest and latest output spike times, 
then a different time slot is assigned for each class in between earliest and latest spike 
time using ParamILS. Spiking Neural Network model builds upon code for XOR from 
GitHub [mba]. * 

WBC Network: consist of an input layer, 1 hidden layer and output layer with 1 
output neuron. Gaussian receptive held or square cosine encoder is not necessary for 
breast cancer data as it has the same value range from 1 to 10. There are 9 input 
variables directly encoded to network and 2 diherent time slots are assigned for output 
spike times corresponded to benign and malignant class. 

Iris Network: consists of an input layer with 4-encoding neurons, 1 hidden layer, 
and an output layer with the 1-output neuron. Input variables are directly encoded to 
network same as WBC network and three diherent time slots are assigned for output 
spike times corresponded to setosa, versicolor, and virginica class. 

A comparison of two benchmark dataset with the result reported by Bothe et 

*We had to make a number of changes to make this code work, in particular setting initial weight 
and target time. Latest version of our work is available to github [Mus] 
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Dataset 

n 

m 

0 

a 

d 

e 

r 

WBC 

9 

9 

1 

0.01 

16 

800 

10 

Iris 

4 

12 

1 

0.05 

14 

100 

9 


Table 4.4: SNN simulation parameters 



Testing (Iris ) 

Testing (WBC) 

SpikeProp ( [BKLP02]) 

96.1% ± 0.1 

97.0% ± 0.6 

Configured SpikeProp 

98.1% ± 0.2 

98.5%± 0.8 


Table 4.5: Comparison between original SpikeProp paper and SpikeProp we config¬ 
ured using ParamILS 

al. [BKLP02] has been shown in table 4.5 and simulation setup for our experiment 
has been reported in table 4.4. All result are based on average 100 independent runs 
with a cutoff time 5.0. 

Evaluating both outcomes as shown in table 4.5, it can be seen that our experiment 
outperform original SpikeProp in term of accuracy and smaller network architecture. 

4.4 Investigating parameters in SNN configuration 

In this section we will closely observe how output spike time related to other parame¬ 
ters such as a number of hidden neurons n, a number of delay terminals dk, threshold 
voltage V, synaptic weights Wijk, coding interval AT and membrane time constant r. 
For example, it was shown that a r needs to be larger than the relevant AT [BKLP02], 
but it is now clear by how much it needs to be larger for better accuracy. As SNNs 
have more configurable parameter (such as membrane time constant r, threshold 6, 
target spike time tf, number of synapses dk, increment rate of delay, weight range, 
and coding interval AT) than ANN, training SNN is more complicated than ANN. 
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4.4.1 Weight bounds 

Setting initial weight of network have a signihcant impact to enable the network to 
converge rapidly. If a neuron does not hre at all, this neuron will have no contribution 
to entire training process [MLB06]. If the weights are too small, neurons might never 
hre. On the other hand, if the weights are too large neurons will hre earlier than 
expected and the network will not consistently converge. That’s why it is important 
to set initial weight in such a way so that each neuron hre at least once. An approach to 
set weight range was proposed in [MLB06] However, it requires knowing target times. 
Instead, we set weights based on number of previous layer neurons n^, maximum 
number of delay path d, tmax = t + d, and the time step tmin depend on data set. For 
Iris dataset tmin = 0.1 and for Wisconsin breast cancer dataset tmin = 1- If actually 
depends on value changing step of input data. 

With that, we use the following formulas for computing minimum and maximum 
weights. 


Minimum Weight 


^min 


TV 

drii 


3I (tniax /T) 


Maximum Weight 


^max 


TV 

drii 


(4.1) 


(4.2) 


Weight is initialized by random value in range minimum and maximum weight asso¬ 
ciated with minimum weight. 

weight = Wmin + random{wmin, Wmax, d) (4.3) 
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4.4.2 Threshold, delays and membrane time constant r 

Membrane time constant (r) have to be chosen in such way so that there will have a 
common range of bring time for all input values of the same class. A maximum number 
of delays (synapses) d and r are not independent. The formulas for determining 
minimum and maximum weight in equations 4.1,4.2 goes reverse direction depending 
on train and tmax- To avoid minimum weight getting bigger than maximum weight, 
we need to balance values for time step and d: decrease time step and increase d. We 
looked at changing the delay increment, as well, but did not get conclusive results. 
Increasing the delay increment did not seem to affect the behavior of the network and 
varying the data set did not seem to affect this behavior. 

Threshold does not make an effect on spike firing time. Because we use a threshold 
to set weight range. It just becomes a multiplicative factor on both sides inequality 
for checking if the membrane potential exceeds threshold. Therefore, it is possible to 
set threshold to 1 without affecting the performance. 

There seems to be an interesting interplay between r and the number of delays d 
with respect to accuracy. In our experiments, the isolines of the same accuracy on the 
3D graphs with r and d on x and y axes look essentially linear. That is, increasing 
d and r proportionally to each other seems to keep the accuracy of the network the 
same. For a given r, as d increases, the performance increases as well. However, in 
all our experiments there was another cut-off when the value of d became too large 

More specifically, here are the accuracy landscapes for our datasets with respect 
to r and d. Each point in this landscape is an average of 30 runs for the given values 
of r and d. For each of the isolines, we used Matlab cftool to fit a line to describe 
that isoline as d = piT + p 2 - 
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Membrane Time Constant 


Figure 4.1: Accuracy as a function of r and d for WBC dataset 

Membrane time constant r vs. number of delays d in WBC dataset 

Accuracy for classifying WBC benchmark datasets with respect to delay and r has 
been ploted in hgure 4.1. 

The good performance area is with d above the line between yellow and green (see 
hgure 4.2), which is a drop in accuracy from 0.92 to 0.86. By using MATLab cftool 
to determine the parameters of this line, we obtain the following description of this 
isoline: d = 1.583r + 1.861. According to cftool, the SSE of this ht is 1.806, and 
adjusted i?-square is 0.9064. See hgure 4.4A for the plot of the data and the line. 
This isoline denotes the bottom boundary for the good accuracy region: accuracy 
drops sharply when d is below 1.583r + 1.861. 

Then, there is a milder cutoh as d grows too large with respect to r, and the 
accuracy diminishes to below 0.98. The line (see hgure 4.4B ) corresponding to this 
cutoh is dehned hj d = 12.95r + 0.8637 (adjusted i7-square: 0.9786). 

If we zoom in on the good performance region (see hgure 4.3), we will see a 
distinct "ridge" of the best performance, which also seems to follow a linear pattern 


43 









Membrane Time Constant 


Figure 4.2: Yellow part of area accuracy of at least 97% and the rest set to zero of 
WBC data 

More specifically, the points with accuracy > 0.99 form a line (see hgure 4.4C) d = 
1.714r + 6.619 (adjusted i?-square: 0.9554). Thus, there is an optimal region of 
accuracy, bordered by isolines of d as a function of r. 


44 





Membrane Time Constant 


2A 

22 

Delay” 


Figure 4.3: Zoom in picture on high accuracy region of WBC data 
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Figure 4.4: Fitted lines for borders of the good accuracy region and the "ridge of the 
best accuracy" in WBC 
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Iris dataset: r vs. d 


Similarly, performance variation in term of number of delays d and r has been shown in 
hgure 4.5 and 4.6 for Iris data set, as well as a permuted Iris dataset, respectively. As 
for WBC, there seems to be a line where accuracy is maximized for the permuted Iris 
dataset: namely, d = 0.1593r + 16.31. This behaviour is fairly resilient to permuting 
the dataset, as seen on hgure 4.8. 

Iris original dataset seems to have a platen of best accuracy, bordered by a line 
d = 0.9558 * T — 0.2052 below, and d = 2.464 * t — 4.107 above. Similarly, for 
permuted Iris dataset there is a good accuracy plateau (with a faint "best accuracy 
ridge” close to the top border, see hgures 4.7 and 4.9), bordered from above by a line 
d = 2.464 * T — 4.107, and from below by a line d = 0.9558 *t — 0.2052. 



Figure 4.5: Accuracy as a function of r and d for original Iris dataset 
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gure 4.6: 


Accuracy as a function of r and d for permuted Iris dataset 



Figure 4.7: Permuted Iris dataset zoom in 
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Iris best performance region boundaries 



Figure 4.8: Boundaries of best performance region for Iris data set, original (blue and 
red lines) and permuted (orange and purple lines) 



Figure 4.9: Region for permuted Iris dataset with accuracy > 99 
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4.4.3 Selecting Target Times 


Correctly choosing target time is the biggest challenge for training spiking neural 
network. There is not much explanation in existing work for setting target times. For 
example standard XOR take inputs 0 for 0, 6 for 1 and targets are 10 and 16 without 
having a specihc hypothesis. Automated algorithm conhguration tools "ParamILS" 
has been used in our approach for selecting target time. First and last output spike 
time of network is determined which is used as a range of target time. First output 
spike time and last output spike time of network mainly depend on maximum weight 
and minimum connection weight respectively. 

Earliest Spike Time: 

The hrst spike depends on maximum weight and smallest input while latest spike 
time depends on minimum weight and largest input. To determine hrst bring time 
at hidden layer let us consider smallest input 0 and maximum weight between input 
layer and hidden layer Wimax- 

dmax 

V < TliWimax ^“ 0 - dk)e{t - 0 - 4) (4.4) 

k=l 

Here, H{s) denotes heavy-side step function, H{s) = 0 for s < 0; H{s) = 1 for s >= 0. 
The hrst bring time ti at a hidden layer is considered as smallest input at output layer 
and maximum weight Whmax between hidden and output layer for determining hrst 
bring time at the output layer. 

dmax 

V < UhWhmax ^“ dk)e{t - U - dk) (4.5) 

k=l 
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Latest Spike Time: 


Latest spike time of network is determined as like earliest spike time with considering 
minimum weight instead of taking the maximum weight. The latest hring time at 
hidden layer is calculated by the following equation. 

dmax 

V < UiWimin Hit - 0 - dk)eit - 0 - 4) (4.6) 

fc=i 

Similarly, the latest hring time at output layer is: 

f^max 

^ — ^h^hmin ^ ) Hit 4 ) (4T) 

k=l 

4.4.4 Target times 

When we hrst encountered the heuristic of choosing target times by taking the average 
in each class of the outputs of a randomly intialized network, it seemed a surprisingly 
ad hoc approach. However, our experiments with ParamILS seem to indicate that 
it actually works fairly well for the types of dataset we used. More specihcally, even 
though the classes in WBC and Iris are not quite linearly separable, the instances 
from different classes look, on average, quite distinct. 

Consider the following hgures, which show a such a simple metric as a sum of 
input values for each class. From hgure 4.10, it is clear that the range of this sum 
for the malignant class is mostly from 30 up, with an average around 55, whereas 
in benign class very few instances have values above 25, and none above 50. More 
precisely, the sum of the values in malignant class is from 22 to 88, while in benign 
class the sums range from 5 to 49. That is, it is natural to expect that on a randomly 
trained network the average output spike in the benign data will be much earlier than 
in malignant data. And indeed, ParamILS came up with values 7 and 8 for target 
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spikes for benign and malignant data, respectively. 



Fig: (A) Malignant Data Range Fig: (B) Benign Data Range 

Figure 4.10: Sums of inputs for malignant and benign classes of Wisconsin Breast 
Cancer data 


Similarly, for Iris dataset the classes have different average sums of inputs (see 
hgure 4.11), though here the overlap between Versicolor and Virginica is more signif¬ 
icant. For this dataset, ParamILS gave target values of 4 for Setosa, 5 for Versicolor 
and 6 for Virginica. 

Input data range of Iris data is [0.1,7.9], while for WBC it is [1,10]. 



Setosa Data Range Versicolor Data Range Verginica Data Range 

Figure 4.11: Sums of inputs for Setosa, Versicolor and Virginica classes in Iris dataset 
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Chapter 5 


Conclusion 


In this project, we have looked at conhguring artihcial neural network training algo¬ 
rithms, in particular the spiking neural network training algorithm SpikeProp due to 
Bohte [BKLP02]. We have shown that using automated algorithm conhguration tools 
it is possible to achieve better accuracy than reported in the literature. 

That raises the question of comparing the quality of different algorithms or even 
different modihcations of the same algorithm: how can we be reasonably sure that the 
performance increase is due to the modihcation in question, as opposed to a tweak 
to parameters? We think that using automated algorithm conhguration tools would 
allow for a more fair comparison of algorithms, and, though it does not constitute a 
formal proof, it would alleviate some of such concerns. 

There is a number of directions that we are currently investigating. So far, we used 
the simplest possible encoding of the input: assigning a number to a corresponding 
discrete time slot in the coding interval. The next step would be to compare the 
direct encoding of the input with a more complex way such as Gaussian or cosine 
receptive helds encodings. These encodings come with their own sets of parameters 
to be optimized together with the rest of the network parameters, which we would do 
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using ParamILS. 

Another project would be to change the way the output of the network is repre¬ 
sented. In particular, rather than having a single output neuron and match classes to 
spike times, we would like to look at the winner-takes-all representation, where there 
is an output neuron for each class, and the output neuron that hres hrst is considered 
to be the answer. Using this representation would allow us to sidestep the question 
of target times, potentially just training for the earliest-possible versus latest-possible 
spike. So far, though, we were not able to achieve good results using this output 
representation. 

Overall, we would like to understand the question of target times better. Ideally, 
given a data set, we would like to be able to analyse it and derive the desired target 
times just from the statistical properties of the datasets (there, we would need to 
calculate the other parameters, particularly r and d, simultaneously, as the target 
times depend on these parameters). It would be good to understand spiking neural 
networks, at least of the delayed type used with SpikeProp, enough to derive formulas 
for setting the parameters of a training algorithms, instead of relying on heuristics. 

Finally, we would like to return to the project from which we started: augmenting 
spiking neural networks with glia, in the style of [Sajl4]. This would create artihcial 
networks of cells of different types and functions, making them more biologically 
realistic. Additionally, we would like to leverage the other types of improvements 
in modern artihcial neural networks, and work with more complex datasets. This 
thesis makes a step towards understanding how to conhgure and compare the existing 
algorithms, and gives us ways to make such future comparisons more meaningful. 
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